Abstract: A parallel algorithm for prefix computation was recently
reported on an interconnection network called the OTIS-Mesh of
Trees [4]. Using $n^4$ processors, that algorithm was shown to run in
$13\log n + O(1)$ electronic moves and 2 optical moves for $n^4$ data
points. In this paper we present a new and improved parallel
algorithm for prefix computation on the OTIS-Mesh of Trees. The
algorithm requires $10\log n + O(1)$ electronic moves and 1 optical
move for prefix computation on the same number of processors and
data points as considered in [4].

Index Terms: Prefix computation, parallel algorithm, time
complexity.

I. INTRODUCTION 

The Optical Transpose Interconnection System (OTIS) [1],[2] is a
hybrid architecture that benefits from both optical and electronic
connections. Optical connections link processors whose separation
exceeds a few millimetres (processors in different packages), and
electronic connections link nearby processors (within the same
physical package). Several models exploit this combination of
optical and electronic connections.
Given $N$ data values $x_1, x_2, \ldots, x_N$ and an associative
binary operation $\circ$, the prefix problem is to compute
$P_i = x_1 \circ x_2 \circ x_3 \circ \cdots \circ x_i$ for
$1 \le i \le N$. Depending on the problem, $\circ$ may be an
arithmetic or a logical operation. Throughout this paper we assume
that $\circ$ is binary addition.
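As a point of reference for the parallel algorithms discussed later, the prefix problem defined above can be sketched sequentially. The function name `prefix` and the sample values are illustrative assumptions, not part of the paper.

```python
from itertools import accumulate
import operator

def prefix(values, op=operator.add):
    """Return [P_1, ..., P_N] with P_i = x_1 o x_2 o ... o x_i."""
    return list(accumulate(values, op))

x = [3, 1, 4, 1, 5]
print(prefix(x))        # prefix sums: [3, 4, 8, 9, 14]
print(prefix(x, max))   # any associative operation works, e.g. max: [3, 3, 4, 4, 5]
```

The paper fixes the operation to addition, which is the default here.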

II. RELATED WORK

Many researchers have developed parallel algorithms for computing
the prefix on several topologies. An N-point parallel algorithm on a
two-dimensional mesh with $3\sqrt{N} - 1$ communication steps can be
found in [8]. Egecioglu and Srinivas [15] presented an optimal
algorithm, also on the mesh architecture, with $2\sqrt{N} + 1$
communication steps and $\log N + 1$ arithmetic steps. A parallel
algorithm on an extended Multimesh reported in [14] requires
$13N^{1/4}$ (electronic) communication steps and $\log N + 4$
arithmetic steps. Wang and Sahni [7] developed a parallel algorithm
for N-point prefix computation in $(8N^{1/4} - 1)$ electronic moves
and 2 OTIS moves on N processors, for both the SIMD and MIMD models.
In the same paper they modified the algorithm to complete the prefix
computation in $(7N^{1/4} - 1)$ electronic moves and 2 OTIS moves.
Jana and Sinha [19] developed an improved algorithm with
$(5.5N^{1/4} + 3)$ electronic moves and 2 OTIS moves on the same
model. Jana and Mallick developed a parallel prefix algorithm on the
OTIS-Mesh of Trees [4] that computes the prefix of N numbers in
$3.5\log N + O(1)$ electronic steps and 2 OTIS moves. In this paper
we propose an improved parallel prefix algorithm on the OTIS-Mesh of
Trees. We show that the algorithm requires $10\log n + O(1)$
electronic moves and 1 optical move, which improves on the parallel
prefix algorithm on the OTIS-MOT [4]. The rest of the paper is
organized as follows. Section III describes the topology of the
OTIS-Mesh of Trees. In Section IV, subsection A describes the
parallel prefix algorithm within each group as given for the MOT in
[4], and subsection B describes the proposed parallel algorithm.

III. TOPOLOGY

We describe the topology of the OTIS-MOT in three subsections.
Subsection A describes the topology of the Mesh of Trees,
subsection B describes the OTIS network, and subsection C describes
the computational model onto which we map the parallel prefix
algorithm.

A. Topology of Mesh of Trees (MOT)

In an $n \times n$ MOT, $n^2$ processors are organized as an
$n \times n$ lattice. Each processor is denoted by P(i,j), where i
and j are its row and column respectively and $1 \le i, j \le n$.
The interconnections among the processors are as follows:
i. The processors in the i-th row are connected to form a binary tree
rooted at processor P(i,1), i.e. for j = 1 to $\lfloor n/2 \rfloor$,
processor P(i,j) is directly connected to processors P(i,2j) and
P(i,2j+1) whenever they exist. We call these binary trees row trees.
ii. The processors in the j-th column are connected to form a binary
tree rooted at processor P(1,j), i.e. for i = 1 to
$\lfloor n/2 \rfloor$, processor P(i,j) is directly connected to
processors P(2i,j) and P(2i+1,j) whenever they exist. We call these
binary trees column trees.
iii. All links are bidirectional.

The topology of the MOT is shown in Figure 1.
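The connection rules above can be turned into a small link generator. This is a minimal sketch; the function name `mot_links` and the representation of a link as a `frozenset` of two processor coordinates are assumptions made for illustration.

```python
def mot_links(n):
    """Return the undirected links of an n x n Mesh of Trees.

    Processors are 1-indexed pairs (row, column), as in the text.
    """
    links = set()
    for t in range(1, n + 1):                  # t indexes a row tree and a column tree
        for j in range(1, n // 2 + 1):         # internal tree nodes 1 .. floor(n/2)
            # children 2j and 2j+1, kept only if they exist in the lattice
            for child in range(2 * j, min(2 * j + 1, n) + 1):
                links.add(frozenset({(t, j), (t, child)}))   # row tree of row t
                links.add(frozenset({(j, t), (child, t)}))   # column tree of column t
    return links

print(len(mot_links(5)))   # 40
```

Each of the n row trees and n column trees contributes n - 1 links, so the 5x5 network of Figure 1 has 2 * 5 * 4 = 40 links.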




Figure 1. 5x5 Mesh of Trees

B. OTIS Network

The Optical Transpose Interconnection System is a hybrid
architecture that benefits from both optical and electronic
connections. In this topology, ordinary electronic links are used
for communication between processors within a group, and optical
links are used for communication between processors of different
groups. According to the OTIS rule, the P-th processor of the G-th
group is connected to the G-th processor of the P-th group. The OTIS
pattern is named after the interconnection of the processors within
a group, e.g. OTIS-MOT, OTIS-Hypercube, OTIS-Mesh. The topology of
OTIS is shown in Figure 2.

Figure 2. OTIS Network

C. Computational Model 

In the Optical Transpose Interconnection System, processors in
different groups are connected by optical links. The $n^4$
processors are divided into $n \times n$ lattices, each containing
$n^2$ processors. The processor placed at the i-th row and j-th
column within a lattice is denoted by P(i,j,k,l), where k and l
denote the lattice coordinates. Processors within a lattice are
connected by electronic links, and processors placed in different
lattices can be connected by optical links. According to the OTIS
rule, processor P(i,j,k,l) is connected to processor P(k,l,i,j). The
connectivity, including electronic and optical links, is shown in
Figure 3.
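The OTIS rule stated above is a transpose of the two coordinate pairs. A minimal sketch, with the hypothetical helper name `otis_partner` and processors written as 4-tuples:

```python
def otis_partner(p):
    """Processor reached from P(i, j, k, l) over its optical link."""
    i, j, k, l = p
    return (k, l, i, j)

p = (1, 2, 3, 1)
print(otis_partner(p))                      # (3, 1, 1, 2)
print(otis_partner(otis_partner(p)) == p)   # True: the link is bidirectional
```

Applying the rule twice returns the original processor, and a processor with (i,j) equal to (k,l) is mapped to itself.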




Figure 3. OTIS-Mesh of Trees. Not all links are shown; links are
bidirectional.



IV. PROPOSED PARALLEL PREFIX ALGORITHM 

In this section we first describe the parallel prefix algorithm
within each group, as considered in [4]. We then describe the
proposed parallel prefix algorithm on the OTIS-Mesh of Trees for
$n^4$ data points in subsection B.

A. Parallel Prefix in each group 

Data initialization: In each group, the $n^2$ data elements are
stored in the B(i,j) registers in row-major order. Register B(i,j)
is initialized with the data element $x_{(i-1)n+(j-1)}$,
$1 \le i, j \le n$, as shown for $n = 5$ in Figure 4.1.

Figure 4.1 Data initialization on lattice

step 1: Perform the prefix computation on each row in parallel, as
given in [8], and store the result in the A(i,j) registers. After
this step register A(i,j) contains $x[(i-1)n : (i-1)n+(j-1)]$, where
$x[p:q]$ denotes the sum $x_p + x_{p+1} + \cdots + x_q$ for
$p \le q$. Figure 4.2 abbreviates each such sum to a single
subscripted symbol.

Figure 4.2 Data in A(i,j) register

step 2: Perform the summation in each row tree on the contents of
the B(i,j) registers of that row and store the result in B(i,1) of
the root processor P(i,1). After this step B(i,1) holds
$x[(i-1)n : in-1]$.

step 3: Perform a modified prefix on the contents of B(i,1) in the
first column tree, using a procedure similar to that given in [8].
In this modified prefix, processor P(i,1) computes the sum
$x[0 : (i-1)n-1]$ for $2 \le i \le n$, and B(1,1) holds 0. The result
is shown in Figure 4.3, where "-" indicates a don't-care value.

Figure 4.3 Modified prefix on the first column tree

step 4: Broadcast the content of B(i,1) along each row, as shown in
Figure 4.4.

Figure 4.4 Data in B(i,j) register after step 4

step 5: Add the contents of the A(i,j) and B(i,j) registers for
$1 \le i, j \le n$ and store the result in the B(i,j) registers to
produce the final prefix. The content of the B(i,j) registers after
step 5 is shown in Figure 4.5.

Figure 4.5 Data in B(i,j) register after step 5

Time complexity: Steps 1 and 3 require $\log n$ steps each. Steps 2
and 4 also require $\log n$ steps each. Step 5 requires no data
movement. Therefore the above algorithm requires $4\log n + O(1)$
steps.
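The five steps above can be checked with a data-level sketch that tracks only register contents. It does not model the tree communication or its step count, and it uses 0-based list indices; the function name `local_prefix` is an assumption made for illustration.

```python
from itertools import accumulate

def local_prefix(B):
    """Data-level sketch of steps 1-5 on one n x n group.

    B is a list of n rows; B[i][j] holds x_{i*n + j} (0-indexed here).
    """
    n = len(B)
    # Step 1: prefix along each row, kept in register A.
    A = [list(accumulate(row)) for row in B]
    # Step 2: each row tree sums its row into the root B(i,1).
    row_sum = [sum(row) for row in B]
    # Step 3: modified prefix on the first column: the root of a row
    # receives the sum of all earlier rows, and the first row receives 0.
    offset = [0] + list(accumulate(row_sum))[:-1]
    # Step 4: broadcast the offset along each row.
    # Step 5: add registers A and B element-wise.
    return [[A[i][j] + offset[i] for j in range(n)] for i in range(n)]

n = 5
data = [[i * n + j for j in range(n)] for i in range(n)]
flat = [v for row in local_prefix(data) for v in row]
print(flat == list(accumulate(range(n * n))))   # True
```

The check compares the lattice result, read in row-major order, with a sequential prefix sum over the same 25 values.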

B. Parallel Prefix on OTIS-MOT 

Data initialization: We store the data elements
$x_0, x_1, x_2, x_3, \ldots, x_{n^4-1}$ in the B registers in
row-major order within each block and also in row-major order from
block to block, i.e. $B(i,j,k,l)$ is initialized with
$x_{(i-1)n^3+(j-1)n^2+(k-1)n+(l-1)}$, illustrated for $n = 3$.
Register C(i,j) is initialized with the value 0.

step 1: Compute the local prefix of each group, stored in register
B(i,j), in parallel by applying the algorithm described in
subsection A of Section IV, and store the result in register B(i,j).
After this step $B(i,j,k,l)$ holds
$x[(i-1)n^3+(j-1)n^2 : (i-1)n^3+(j-1)n^2+(k-1)n+(l-1)]$, as shown in
Figure 4.6.

Figure 4.6 Data in B(i,j) register in each group

step 2: After computing the local prefix, copy the content of
register B(n,n) to register A(n,n) in each group in parallel.

step 3: Broadcast the content of register A(n,n) to registers
A(i,j), $1 \le i, j \le n$, through the mesh of trees of each group
in parallel. The content of register A(i,j), $1 \le i, j \le n$, for
group 12 is shown in Figure 4.7.

Figure 4.7 Content of register A(i,j) for group 12

Rule 1. Processors in any group can send data through valid OTIS
links only to processors in successor groups. Groups are numbered in
row-major order, $1 \le g \le n^2$, where g is the group number, and
this numbering determines which groups are predecessors and which
are successors of a given group: a group with a larger group number
is a successor of a group with a smaller group number, e.g. group 2
is a successor of group 1. The distribution of group numbers is
shown in Figure 4.8.
Figure 4.8 Distribution of group numbers

step 4: After step 3, the processors of each group send the content
of their A(i,j) registers to the C(i,j) registers of processors in
successor groups only, in parallel through the OTIS links, as
required by Rule 1. The OTIS move from the A(i,j) to the C(i,j)
registers is shown in Figure 4.9.

Figure 4.9 OTIS move from A(i,j) to C(i,j). Not all moves are shown;
links are bidirectional.

step 5: Perform the prefix computation on each row of the C(i,j)
registers of each group in parallel, as given in [8], and store the
result in the C(i,j) registers for $1 \le i, j \le n$.

step 6: After step 5, perform the prefix computation on the C(i,j)
registers of the last column of each group in parallel, as given in
[8]. The content of the C(i,j) registers of each group after step 6
is shown in Figure 4.10.

Figure 4.10 Content of C(i,j) register after step 6

step 7: Broadcast the content of register C(n,n) to registers
C(i,j), $1 \le i, j \le n$, of each group in parallel. The content
of the C(i,j) registers of group 2 after step 7 is shown in
Figure 4.11.

Figure 4.11 Content of register C(i,j) for group 2

step 8: Finally, add the contents of the C(i,j) and B(i,j) registers
for $1 \le i, j \le n$ in each group in parallel and store the result
in the B(i,j) registers. The final result of the prefix computation
is shown in Figure 4.12.

Figure 4.12 Data in B(i,j) register of each group

step 9: Stop
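The data flow of steps 1 to 8 can be checked with a sequential, data-level sketch. It assumes 0-based group and processor numbers in row-major order, reads the OTIS rule as "processor p of group g is linked to processor g of group p", and replaces the local prefix of step 1 by a direct prefix over each block. The names `otis_mot_prefix` and `corner_sum` are illustrative, and no communication cost is modelled.

```python
from itertools import accumulate

def corner_sum(block, n):
    """Steps 5-6 on one group: row prefixes, then prefix on the last column."""
    rows = [list(accumulate(block[r * n:(r + 1) * n])) for r in range(n)]  # step 5
    last_col = list(accumulate(row[-1] for row in rows))                   # step 6
    return last_col[-1]                      # value left in C(n,n)

def otis_mot_prefix(x, n):
    """Data-level sketch of steps 1-8 for n**4 values."""
    m = n * n                                # processors per group = number of groups
    assert len(x) == m * m
    # Data initialization: group g holds x[g*m : (g+1)*m]; C is all zeros.
    B = [list(x[g * m:(g + 1) * m]) for g in range(m)]
    C = [[0] * m for _ in range(m)]
    # Step 1: local prefix inside every group.
    B = [list(accumulate(block)) for block in B]
    # Steps 2-3: the group total B(n,n) is copied to A and broadcast in the group.
    A = [[block[-1]] * m for block in B]
    # Step 4: one OTIS move, restricted by Rule 1 to successor groups p.
    for g in range(m):
        for p in range(g + 1, m):
            C[p][g] = A[g][p]
    # Steps 5-7: reduce C to C(n,n) and broadcast it within the group.
    carry = [corner_sum(block, n) for block in C]
    # Step 8: add C to B.
    return [B[g][p] + carry[g] for g in range(m) for p in range(m)]

vals = [(7 * k) % 11 for k in range(81)]
print(otis_mot_prefix(vals, 3) == list(accumulate(vals)))   # True
```

After the OTIS move, group p holds the totals of all predecessor groups in its C registers, so their sum is exactly the offset that its local prefixes still lack.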



RESULTS 



We describe here the time complexity of the parallel algorithm
mapped on the OTIS-MOT with $n^4$ processors.
Time complexity: The algorithm described in subsection A of Section
IV requires $4\log n + O(1)$ time to compute the local prefix in each
group in parallel, as considered for the MOT in [4]. In the proposed
algorithm, described in subsection B of Section IV, step 2 takes
$O(1)$ time. Step 3 takes $2\log n$ time for broadcasting. Step 4
requires a single OTIS move. Step 5 takes $\log n$ steps to compute
the prefix on each row of each group in parallel. Step 6 takes
$\log n$ time. Step 7 takes $2\log n$ time for the broadcast. Step 8
requires no data movement. Therefore the algorithm requires
$10\log n + O(1)$ electronic moves and 1 OTIS (optical) move.
For $N = n^4$, the algorithm requires $2.5\log N + O(1)$ electronic
moves and 1 OTIS (optical) move.
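The step counts quoted above add up as follows; the symbol $T_{\text{elec}}$ for the number of electronic moves is introduced here only for the derivation.

```latex
\begin{align*}
T_{\text{elec}}(n)
  &= \underbrace{4\log n + O(1)}_{\text{step 1}}
   + \underbrace{O(1)}_{\text{step 2}}
   + \underbrace{2\log n}_{\text{step 3}}
   + \underbrace{\log n}_{\text{step 5}}
   + \underbrace{\log n}_{\text{step 6}}
   + \underbrace{2\log n}_{\text{step 7}} \\
  &= 10\log n + O(1), \\
N = n^4 \;\Rightarrow\; \log n = \tfrac{1}{4}\log N
  &\;\Rightarrow\; T_{\text{elec}} = \tfrac{10}{4}\log N + O(1)
   = 2.5\log N + O(1).
\end{align*}
```

Step 4 contributes the single optical move, and step 8 contributes no data movement.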

COMPARATIVE RESULT ANALYSIS 

In Table I we compare the time complexity, in terms of electronic
moves and optical moves, of the parallel prefix algorithms mapped on
OTIS networks.

TABLE I.
COMPARISON OF OTIS-BASED PARALLEL PREFIX ALGORITHMS

Algorithm               Electronic Moves      OTIS Moves
Wang and Sahni [7]      7n - 1                2
Jana and Sinha [19]     5.5n + 3              2
Jana and Mallick [4]    13 log n + O(1)       2
Proposed algorithm      10 log n + O(1)       1
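To give a feel for the expressions in Table I, the sketch below evaluates their leading terms for one lattice size. It assumes base-2 logarithms and drops the O(1) terms, so the numbers are indicative only; the function name `electronic_moves` is hypothetical.

```python
from math import log2

def electronic_moves(n):
    """Leading terms of the electronic-move counts; O(1) terms are dropped."""
    return {
        "Wang and Sahni [7]": 7 * n - 1,
        "Jana and Sinha [19]": 5.5 * n + 3,
        "Algorithm of [4]": 13 * log2(n),
        "Proposed": 10 * log2(n),
    }

for name, moves in electronic_moves(16).items():
    print(f"{name:22s} {moves:6.1f}")
```

For n = 16 this gives 111, 91, 52 and 40 moves respectively under the stated assumptions.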
CONCLUSIONS

In this paper we have presented a new parallel prefix algorithm on
the OTIS-Mesh of Trees for a set of $n^4$ data points. The algorithm
performs parallel prefix on the OTIS-Mesh of Trees in
$10\log n + O(1)$ electronic moves and 1 optical move. By
comparison, the earlier parallel prefix algorithm on the OTIS-Mesh
of Trees [4] requires $13\log n + O(1)$ electronic moves and 2
optical moves for $n^4$ data points on the same number of
processors.

FUTURE WORKS

In future work we will try to map the parallel prefix algorithm onto
other interconnection networks in order to reduce the time
complexity of parallel prefix computation further. A new
interconnection network designed for mapping the parallel prefix
algorithm could also be proposed.